Summary: We prove a multivariate version of Hoeffding's inequality about the distribution of homogeneous polynomials of Rademacher functions. The proof is based on an estimate about the moments of homogeneous polynomials of Rademacher functions which can be considered as an improvement of Borell's inequality in a most important special case.

1. Introduction. Formulation of the main results. 

Hoeffding's inequality states the following result (see e.g. [2], Proposition 1.3.5).

Theorem A. (Hoeffding's inequality). Let $\varepsilon_1,\dots,\varepsilon_n$ be independent random variables, $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, $1\le j\le n$, and let $a_1,\dots,a_n$ be arbitrary real numbers. Put $Z=\sum_{j=1}^n a_j\varepsilon_j$ and $V^2=\sum_{j=1}^n a_j^2$. Then

$$P(Z>u)\le\exp\Big\{-\frac{u^2}{2V^2}\Big\}\quad\text{for all } u>0. \tag{1.1}$$
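For small $n$, the probability in (1.1) can be computed exactly by running over all $2^n$ sign vectors. The sketch below does this; the function names are hypothetical and the brute-force approach is only meant as an illustration for small $n$.

```python
from itertools import product
from math import exp


def hoeffding_exact_tail(a, u):
    """Exact P(Z > u) for Z = sum_j a_j * eps_j with independent fair random signs."""
    n = len(a)
    hits = sum(1 for eps in product((1, -1), repeat=n)
               if sum(aj * ej for aj, ej in zip(a, eps)) > u)
    return hits / 2 ** n


def hoeffding_bound(a, u):
    """Right-hand side of (1.1): exp(-u^2 / (2 V^2)) with V^2 = sum_j a_j^2."""
    V2 = sum(aj * aj for aj in a)
    return exp(-u * u / (2 * V2))


# Usage: compare the exact tail with the bound for a few levels u.
a = [1.0, 0.5, 2.0, -1.5, 0.25, 3.0, -0.75, 1.25]
for u in (1.0, 3.0, 6.0):
    print(u, hoeffding_exact_tail(a, u), hoeffding_bound(a, u))
```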

In the study of U-statistics we need a multivariate version of this result. The goal of this paper is to present such an inequality. To formulate it, we first have to introduce some notation.



Let us fix a positive integer $k$ and some real numbers $a(j_1,\dots,j_k)$ for all sets of arguments $\{j_1,\dots,j_k\}$ such that $1\le j_l\le n$, $1\le l\le k$, and $j_l\ne j_{l'}$ if $l\ne l'$, in such a way that the numbers $a(j_1,\dots,j_k)$ are symmetric functions of their arguments, i.e. $a(j_1,\dots,j_k)=a(j_{\pi(1)},\dots,j_{\pi(k)})$ for all permutations $\pi\in\Pi_k$ of the set $\{1,\dots,k\}$.

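Let us define, with the help of the above real numbers and a sequence of independent random variables $\varepsilon_1,\dots,\varepsilon_n$, $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$, $1\le j\le n$, the random variable

$$Z=\sum_{\substack{(j_1,\dots,j_k)\colon 1\le j_l\le n\text{ for all } 1\le l\le k\\ j_l\ne j_{l'}\text{ if } l\ne l'}} a(j_1,\dots,j_k)\,\varepsilon_{j_1}\cdots\varepsilon_{j_k} \tag{1.2}$$

and the number

$$V^2=\sum_{\substack{(j_1,\dots,j_k)\colon 1\le j_l\le n\text{ for all } 1\le l\le k\\ j_l\ne j_{l'}\text{ if } l\ne l'}} a^2(j_1,\dots,j_k). \tag{1.3}$$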

Now we formulate the following result.

Theorem 1. (The multivariate version of Hoeffding's inequality). The random variable $Z$ defined in formula (1.2) satisfies the inequality

$$P(|Z|>u)\le A\exp\Big\{-\frac12\Big(\frac uV\Big)^{2/k}\Big\}\quad\text{for all } u>0 \tag{1.4}$$

with the constant $V$ defined in (1.3) and some constant $A>0$ depending only on the parameter $k$ in the expression $Z$.

Let us remark that the condition that the coefficients $a(j_1,\dots,j_k)$ are symmetric functions of their variables is not a real restriction, since replacing all coefficients $a(j_1,\dots,j_k)$ in formula (1.2) by $a_{\rm Sym}(j_1,\dots,j_k)=\frac1{k!}\sum_{\pi\in\Pi_k}a(j_{\pi(1)},\dots,j_{\pi(k)})$, where $\Pi_k$ denotes the set of all permutations of the set $\{1,\dots,k\}$, does not change the random variable $Z$. The identities $EZ=0$ and $EZ^2=k!\,V^2$ hold. A comparison of Theorem A and Theorem 1 shows that Theorem 1 yields a slightly weaker estimate in the special case $k=1$ because of the pre-exponential coefficient $A$ in the estimate (1.4). But the expressions in the exponent agree in formulas (1.1) and (1.4) in the special case $k=1$.
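The two facts stated in the remark above (symmetrization leaves $Z$ unchanged, and $EZ=0$, $EZ^2=k!\,V^2$ for symmetric coefficients) can be checked exactly for small $n$ and $k$ by averaging over all $2^n$ sign vectors. The sketch below uses exact rational arithmetic; the helper names and the random integer test coefficients are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial
import random


def random_coefficients(n, k, seed=0):
    """Hypothetical test data: arbitrary (not symmetric) integer coefficients
    a(j_1, ..., j_k) on all ordered k-tuples of distinct indices 0..n-1."""
    rng = random.Random(seed)
    return {j: rng.randint(-5, 5) for j in permutations(range(n), k)}


def symmetrize(a, k):
    """a_Sym(j) = (1/k!) * sum over permutations pi of a(j_pi(1), ..., j_pi(k))."""
    return {j: Fraction(sum(a[tuple(j[p] for p in pi)] for pi in permutations(range(k))),
                        factorial(k))
            for j in a}


def z_value(a, eps):
    """Z from formula (1.2) for one fixed sign vector eps."""
    total = Fraction(0)
    for j, coeff in a.items():
        term = Fraction(coeff)
        for idx in j:
            term *= eps[idx]
        total += term
    return total


def exact_moment(a, n, power):
    """E Z^power, computed exactly by averaging over all 2^n sign vectors."""
    signs = list(product((1, -1), repeat=n))
    return sum(z_value(a, eps) ** power for eps in signs) / len(signs)


# Usage: symmetric coefficients give E Z^2 = k! V^2.
n, k = 5, 3
a_sym = symmetrize(random_coefficients(n, k, seed=0), k)
V2 = sum(c * c for c in a_sym.values())
print(exact_moment(a_sym, n, 2), factorial(k) * V2)
```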

Moreover, estimate (1.4), disregarding the pre-exponential coefficient $A$ in it, is sharp for all parameters $k\ge1$. To see this, let us consider the random variable $Z=Z_n$ defined in (1.2) with the special choice

$$a(j_1,\dots,j_k)=a_n(j_1,\dots,j_k)=\frac{V}{\sqrt{n(n-1)\cdots(n-k+1)}}.$$

It is known (see e.g. [3]) that the random variables $Z_n$ converge, as $n\to\infty$, in distribution to a random variable which can be expressed by means of a $k$-fold Wiener-Itô integral. Moreover, this limit can be expressed in a more explicit form as the distribution of $V\cdot H_k(\eta)$, where $\eta$ is a random variable with standard normal distribution, and $H_k(\cdot)$ is the $k$-th Hermite polynomial with leading coefficient 1. Besides, the tail behaviour of $H_k(\eta)$ is similar to that of $\eta^k$ in a neighbourhood of infinity, and $P(|V\eta^k|>u)=P\big(|\eta|>(u/V)^{1/k}\big)$ decays like $\exp\big\{-\frac12(u/V)^{2/k}\big\}$. Hence the above example shows that if we impose no additional restriction on the coefficients $a(j_1,\dots,j_k)$ of the random variable $Z$, then the estimate (1.4) is essentially sharp: we cannot write a better expression in the exponent of its right-hand side. This problem is discussed in more detail, in a more general context, in Example 2 of paper [5].
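For $k=2$ the example above can be made completely explicit. Since $\varepsilon_j^2=1$, the sum of $\varepsilon_{j_1}\varepsilon_{j_2}$ over ordered pairs $j_1\ne j_2$ equals $S^2-n$ with $S=\varepsilon_1+\dots+\varepsilon_n$, so $Z_n=V(S^2-n)/\sqrt{n(n-1)}$, while $H_2(\eta)=\eta^2-1$. The sketch below (a hypothetical helper, with $V=1$ by default) computes $EZ_n^2$ and $EZ_n^4$ exactly from the binomial distribution of $S$; the fourth moment should approach $E(\eta^2-1)^4=105-60+18-4+1=60$ as $n$ grows.

```python
from fractions import Fraction
from math import comb


def moments_z_n_k2(n, V=1):
    """Exact E Z_n^2 and E Z_n^4 for k = 2 and a(j1, j2) = V / sqrt(n(n-1)).

    Z_n = V * W / sqrt(n(n-1)) with W = S^2 - n, and S = 2i - n when exactly
    i of the n signs equal +1, which has probability comb(n, i) / 2^n.
    """
    m2 = 0  # 2^n * E W^2
    m4 = 0  # 2^n * E W^4
    for i in range(n + 1):
        w = (2 * i - n) ** 2 - n
        weight = comb(n, i)
        m2 += weight * w ** 2
        m4 += weight * w ** 4
    scale2 = Fraction(V ** 2, n * (n - 1))  # Z_n^2 = scale2 * W^2
    total = 2 ** n
    return Fraction(m2, total) * scale2, Fraction(m4, total) * scale2 ** 2


# Usage: E Z_n^2 stays equal to 2! V^2 = 2, and E Z_n^4 approaches 60.
for n in (10, 100, 2000):
    e2, e4 = moments_z_n_k2(n)
    print(n, e2, float(e4))
```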

Theorem 1 can be interpreted as saying that the distribution of $Z$ satisfies an estimate similar to the one satisfied by the distribution of $V\eta^k$, where $\eta$ is a standard normal random variable. We shall prove it as a relatively simple consequence of the following result, which formulates a similar statement about the moments of the random variable $Z$.

Theorem 2. The random variable $Z$ defined in formula (1.2) satisfies the inequality

$$EZ^{2M}\le1\cdot3\cdot5\cdots(2kM-1)\,V^{2M}\quad\text{for all } M=1,2,\dots \tag{1.5}$$

with the constant $V$ defined in formula (1.3).
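The right-hand side of (1.5) equals $E(V\eta^k)^{2M}$ for a standard normal $\eta$, since $E\eta^{2kM}=1\cdot3\cdots(2kM-1)$. For small $n$, inequality (1.5) can be checked exactly by enumerating all sign vectors. The sketch below uses exact integer arithmetic; the helper names and the random symmetric integer coefficients are hypothetical test data, and a finite check of this kind illustrates, but of course does not prove, Theorem 2.

```python
from itertools import combinations, permutations, product
from math import prod
import random


def double_factorial_odd(m):
    """1 * 3 * 5 * ... * m for odd m >= 1."""
    return prod(range(1, m + 1, 2))


def symmetric_integer_coefficients(n, k, seed=0):
    """Hypothetical test data: one random integer per k-subset, copied to all orderings."""
    rng = random.Random(seed)
    a = {}
    for subset in combinations(range(n), k):
        value = rng.randint(-3, 3)
        for ordering in permutations(subset):
            a[ordering] = value
    return a


def check_theorem2(n, k, max_M, seed=0):
    """Return (M, 2^n * E Z^{2M}, 2^n * bound of (1.5)) for M = 1..max_M, as exact integers."""
    a = symmetric_integer_coefficients(n, k, seed)
    V2 = sum(c * c for c in a.values())  # V^2 from formula (1.3)
    z_values = [sum(c * prod(eps[i] for i in j) for j, c in a.items())
                for eps in product((1, -1), repeat=n)]
    rows = []
    for M in range(1, max_M + 1):
        lhs = sum(z ** (2 * M) for z in z_values)
        rhs = 2 ** n * double_factorial_odd(2 * k * M - 1) * V2 ** M
        rows.append((M, lhs, rhs))
    return rows


# Usage: print the ratio E Z^{2M} / bound for a small example.
for M, lhs, rhs in check_theorem2(6, 2, 4):
    print(M, lhs / rhs if rhs else 0.0)
```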
We shall prove Theorem 2 with the help of two lemmas. Before their formulation we introduce the following notation:

$$\bar Z=\sum_{\substack{(j_1,\dots,j_k)\colon 1\le j_l\le n\text{ for all } 1\le l\le k\\ j_l\ne j_{l'}\text{ if } l\ne l'}} a(j_1,\dots,j_k)\,\eta_{j_1}\cdots\eta_{j_k}, \tag{1.6}$$

where $\eta_1,\dots,\eta_n$ are i.i.d. random variables with standard normal distribution, and the numbers $a(j_1,\dots,j_k)$ agree with those in formula (1.2). Now we state

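Lemma 1. The random variables $Z$ and $\bar Z$ defined in formulas (1.2) and (1.6) satisfy the inequality

$$EZ^{2M}\le E\bar Z^{2M}\quad\text{for all } M=1,2,\dots \tag{1.7}$$

Lemma 2. The random variable $\bar Z$ defined in formula (1.6) satisfies the inequality

$$E\bar Z^{2M}\le1\cdot3\cdot5\cdots(2kM-1)\,V^{2M}\quad\text{for all } M=1,2,\dots \tag{1.8}$$

with the constant $V$ defined in formula (1.3).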

Theorem 2 states an estimate about the moments of homogeneous polynomials of the independent random variables $\varepsilon_1,\dots,\varepsilon_n$, which are sometimes called Rademacher functions in the literature. We finish the Introduction by recalling Borell's inequality (see e.g. [1]), which gives a similar estimate. The proofs of the results will be given in Section 2. Then we compare Borell's inequality with our results and make some comments in Section 3.

Theorem B. (Borell's inequality). The moments of the random variable $Z$ defined in formula (1.2) satisfy the inequality

$$E|Z|^p\le\Big(\frac{p-1}{q-1}\Big)^{kp/2}\big(E|Z|^q\big)^{p/q}\quad\text{if } 1<q<p<\infty. \tag{1.9}$$
2. Proof of the results.

Proof of Lemma 1. By carrying out the multiplications in the expressions $EZ^{2M}$ and $E\bar Z^{2M}$, and by exploiting the additive and multiplicative properties of the expectation for sums and products of independent random variables together with the identities $E\varepsilon_j^{2r+1}=0$ and $E\eta_j^{2r+1}=0$ for all $r=0,1,\dots$, we can write

$$EZ^{2M}=\sum_{\substack{l,\ j_1,\dots,j_l,\ m_1,\dots,m_l\colon\ 1\le j_s\le n,\ j_s\ne j_{s'}\text{ if } s\ne s',\\ m_s\ge1,\ 1\le s\le l,\ m_1+\dots+m_l=kM}} A(j_1,\dots,j_l,m_1,\dots,m_l)\,E\varepsilon_{j_1}^{2m_1}\cdots E\varepsilon_{j_l}^{2m_l} \tag{2.1}$$

and

$$E\bar Z^{2M}=\sum_{\substack{l,\ j_1,\dots,j_l,\ m_1,\dots,m_l\colon\ 1\le j_s\le n,\ j_s\ne j_{s'}\text{ if } s\ne s',\\ m_s\ge1,\ 1\le s\le l,\ m_1+\dots+m_l=kM}} B(j_1,\dots,j_l,m_1,\dots,m_l)\,E\eta_{j_1}^{2m_1}\cdots E\eta_{j_l}^{2m_l} \tag{2.2}$$

with some coefficients $A(j_1,\dots,j_l,m_1,\dots,m_l)$ and $B(j_1,\dots,j_l,m_1,\dots,m_l)$ such that

$$|A(j_1,\dots,j_l,m_1,\dots,m_l)|\le B(j_1,\dots,j_l,m_1,\dots,m_l). \tag{2.3}$$

We could express the coefficients $A(\cdot,\cdot,\cdot)$ and $B(\cdot,\cdot,\cdot)$ in an explicit form, but we do not have to do this. What is important for us is that $A(\cdot,\cdot,\cdot)$ can be expressed as the sum of certain terms, and $B(\cdot,\cdot,\cdot)$ as the sum of the absolute values of the same terms, hence relation (2.3) holds. Since $E\varepsilon_j^{2m}=1\le E\eta_j^{2m}=1\cdot3\cdots(2m-1)$ for all parameters $j$ and $m$, formulas (2.1), (2.2) and (2.3) imply Lemma 1.
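The expansion behind (2.1) and (2.2) can be carried out mechanically: store $Z$ as a polynomial, raise it to the power $2M$, and replace each monomial $\prod_j x_j^{p_j}$ by $\prod_j E X_j^{p_j}$ for the chosen distribution. The sketch below does this for Rademacher and standard Gaussian coordinates and compares the two results, illustrating inequality (1.7) on small examples. The dictionary representation and all helper names are assumptions made for this illustration; the code only evaluates both sides and does not reconstruct the coefficients $A$ and $B$ themselves.

```python
from collections import defaultdict
from itertools import combinations, permutations
from math import prod
import random


def random_symmetric_coefficients(n, k, seed=0):
    """Hypothetical test data: a random integer per k-subset, copied to all orderings."""
    rng = random.Random(seed)
    a = {}
    for subset in combinations(range(n), k):
        value = rng.randint(-3, 3)
        for ordering in permutations(subset):
            a[ordering] = value
    return a


def chaos_polynomial(a, n):
    """Z as a polynomial {exponent vector: coefficient}, merging orderings of one monomial."""
    poly = defaultdict(int)
    for j, c in a.items():
        exponents = [0] * n
        for idx in j:
            exponents[idx] += 1
        poly[tuple(exponents)] += c
    return dict(poly)


def poly_mul(p, q):
    """Product of two polynomials in the same dictionary representation."""
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[tuple(x + y for x, y in zip(e1, e2))] += c1 * c2
    return dict(out)


def expected_power(poly, n, power, moment):
    """E P(X)^power for independent X_1..X_n with E X_j^m = moment(m):
    expand the power, then take expectations monomial by monomial."""
    expanded = {(0,) * n: 1}
    for _ in range(power):
        expanded = poly_mul(expanded, poly)
    return sum(c * prod(moment(m) for m in e) for e, c in expanded.items())


def rademacher_moment(m):
    return 1 if m % 2 == 0 else 0


def gaussian_moment(m):
    """E eta^m = 1*3*...*(m-1) for even m (1 for m = 0), and 0 for odd m."""
    return prod(range(1, m, 2)) if m % 2 == 0 else 0


# Usage: E Z^4 versus E Zbar^4 for n = 5, k = 3.
P = chaos_polynomial(random_symmetric_coefficients(5, 3, seed=1), 5)
print(expected_power(P, 5, 4, rademacher_moment), expected_power(P, 5, 4, gaussian_moment))
```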

Proof of Lemma 2. Let us consider a white noise $W(\cdot)$ on the unit interval $[0,1]$, i.e. let us take a set of jointly Gaussian random variables $W(A)$ indexed by the measurable sets $A\subset[0,1]$ such that $EW(A)=0$ and $EW(A)W(B)=\lambda(A\cap B)$, with the Lebesgue measure $\lambda$, for all measurable subsets $A,B$ of the interval $[0,1]$. (We also need the relation $W(A\cup B)=W(A)+W(B)$ with probability 1 if $A\cap B=\emptyset$, but this relation is a consequence of the previous ones. Indeed, they yield that $E\big(W(A\cup B)-W(A)-W(B)\big)^2=0$ if $A\cap B=\emptyset$, and this implies the desired identity.) Let us introduce the random variables $\eta_j=\sqrt n\,W\big(\big[\frac{j-1}n,\frac jn\big)\big)$, $1\le j\le n$, together with the function $f(t_1,\dots,t_k)$, with arguments $0\le t_s<1$ for all indices $1\le s\le k$, defined as

$$f(t_1,\dots,t_k)=\begin{cases} n^{k/2}\,a(j_1,\dots,j_k) & \text{if } t_s\in\big[\frac{j_s-1}n,\frac{j_s}n\big),\ 1\le j_s\le n,\ 1\le s\le k,\ \text{and } j_s\ne j_{s'}\text{ if } s\ne s',\\[4pt] 0 & \text{if } t_s\in\big[\frac{j_s-1}n,\frac{j_s}n\big),\ 1\le j_s\le n,\ 1\le s\le k,\ \text{and } j_s=j_{s'}\text{ for some } s\ne s'. \end{cases} \tag{2.4}$$
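The function $f$ in (2.4) is a step function on the $n^k$ cubes of side $1/n$, so the two identities used next, $\int f^2=V^2$ and the representation of $\bar Z$ as a $k$-fold integral of $f$, reduce to finite sums over cubes. The sketch below checks both numerically. It assumes, as in the definition of Wiener-Itô integrals of elementary functions vanishing on the diagonals, that such an integral is the sum over cubes of $f$ times the product of the white-noise increments $W(\Delta_{j_s})$, which are simulated here as independent $N(0,1/n)$ variables. All helper names are hypothetical.

```python
from itertools import combinations, permutations, product
from math import isclose, prod, sqrt
import random


def random_symmetric_float_coefficients(n, k, seed=0):
    """Hypothetical test data: symmetric real coefficients on ordered distinct tuples."""
    rng = random.Random(seed)
    base = {s: rng.uniform(-1.0, 1.0) for s in combinations(range(n), k)}
    return {j: base[tuple(sorted(j))] for j in permutations(range(n), k)}


def step_function_values(a, n, k):
    """Values of f from (2.4) on the n^k cubes, indexed 0-based by (j_1, ..., j_k)."""
    scale = n ** (k / 2)
    return {j: (scale * a[j] if len(set(j)) == k else 0.0)
            for j in product(range(n), repeat=k)}


def integral_of_square(f_cells, n, k):
    """Integral of f^2 over [0,1)^k; every cube has Lebesgue measure n^(-k)."""
    return sum(v * v for v in f_cells.values()) / n ** k


def elementary_integral(f_cells, increments):
    """Sum over cubes of f(cube) * W(Delta_{j_1}) * ... * W(Delta_{j_k})."""
    return sum(v * prod(increments[i] for i in j) for j, v in f_cells.items())


# Usage: compare both sides of the two identities for n = 4, k = 3.
n, k = 4, 3
a = random_symmetric_float_coefficients(n, k, seed=2)
f_cells = step_function_values(a, n, k)
rng = random.Random(3)
increments = [rng.gauss(0.0, 1.0 / sqrt(n)) for _ in range(n)]  # W(Delta_j) ~ N(0, 1/n)
eta = [sqrt(n) * w for w in increments]                          # eta_j = sqrt(n) W(Delta_j)
z_bar = sum(c * prod(eta[i] for i in j) for j, c in a.items())   # formula (1.6)
print(integral_of_square(f_cells, n, k), sum(c * c for c in a.values()))
print(elementary_integral(f_cells, increments), z_bar)
```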

Observe that the random variables $\eta_1,\dots,\eta_n$ defined above are independent with standard normal distribution, hence we may assume that they appear in the definition of the random variable $\bar Z$ in formula (1.6). With such a choice we can represent $\bar Z$ in the form of a $k$-fold Wiener-Itô integral (introduced e.g. in [4])

$$\bar Z=\int f(t_1,\dots,t_k)\,W(dt_1)\dots W(dt_k)$$

of the (elementary) function $f$ defined in formula (2.4) with respect to the white noise $W$ we have introduced. Besides, the identity

$$\int f^2(t_1,\dots,t_k)\,dt_1\dots dt_k=V^2$$

also holds with the number $V$ defined in formula (1.3), since $f^2$ equals $n^k a^2(j_1,\dots,j_k)$ on each cube $\prod_{s=1}^k\big[\frac{j_s-1}n,\frac{j_s}n\big)$ with distinct indices $j_s$, it vanishes on the remaining cubes, and each cube has Lebesgue measure $n^{-k}$. Hence to complete the proof of Lemma 2 it is enough to show that if a function $f$ of $k$ variables and a $\sigma$-finite measure $\mu$ on some measurable space $(X,\mathcal X)$ satisfy the relation

$$\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)=\sigma^2<\infty$$

with some $\sigma^2>0$, then the moments of the $k$-fold Wiener-Itô integral (defined e.g. in [4])

$$k!\,J_{\mu,k}(f)=\int f(x_1,\dots,x_k)\,\mu_W(dx_1)\dots\mu_W(dx_k)$$

of the function $f$ with respect to a white noise $\mu_W$ with counting measure $\mu$ satisfy the inequality $E\big(k!\,J_{\mu,k}(f)\big)^{2M}\le1\cdot3\cdots(2kM-1)\,\sigma^{2M}$ for all $M=1,2,\dots$. But this result (which can be obtained relatively simply from the diagram formula for the product of Wiener-Itô integrals) is proven in Proposition A of paper [5], hence here I omit the proof.¹

¹ For the sake of completeness, I have included the proof of this result, together with some definitions needed to understand it, in an Appendix of this paper, but it will probably not belong to the final version of this work.

Theorem 2 is a straightforward consequence of Lemmas 1 and 2. Hence it remains to prove Theorem 1 with the help of Theorem 2.

Proof of Theorem 1. By the Stirling formula we get from the estimate of Theorem 2 that

$$EZ^{2M}\le\frac{(2kM)!}{2^{kM}(kM)!}\,V^{2M}\le A\Big(\frac2e\Big)^{kM}(kM)^{kM}\,V^{2M}$$

for any $A>\sqrt2$ if $M\ge M_0(A)$. Hence we can write by the Markov inequality that

$$P(|Z|>u)\le\frac{EZ^{2M}}{u^{2M}}\le A\bigg(\frac{2kM}{e}\Big(\frac Vu\Big)^{2/k}\bigg)^{kM} \tag{2.5}$$

for all $A>\sqrt2$ if $M\ge M_0(A)$. Put $k\bar M=k\bar M(u)=\frac12\big(\frac uV\big)^{2/k}$ and $M=M(u)=[\bar M]$, where $[x]$ denotes the integer part of the number $x$. Let us choose a number $u_0$ by the identity $\bar M(u_0)=M_0(A)$. Formula (2.5) can be applied with $M=M(u)$ for $u\ge u_0$, and since $\frac{2kM}e\big(\frac Vu\big)^{2/k}\le\frac{2k\bar M}e\big(\frac Vu\big)^{2/k}=\frac1e$ and $M>\bar M-1$, it yields that

$$P(|Z|>u)\le Ae^{-kM}\le Ae^ke^{-k\bar M}=Ae^k\exp\Big\{-\frac12\Big(\frac uV\Big)^{2/k}\Big\}\quad\text{if } u\ge u_0. \tag{2.6}$$

Formula (2.6) means that relation (1.4) holds for $u\ge u_0$ if the constant $A$ is replaced by $Ae^k$ in it. Since $\frac12\big(\frac{u_0}V\big)^{2/k}=kM_0(A)$ depends only on $k$ and $A$, by choosing the constant $A$ sufficiently large we can guarantee that relation (1.4) holds for all $u>0$.
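Two ingredients of this proof can be examined numerically: the Stirling step, i.e. the behaviour of $(2N-1)!!/\big((2/e)^N N^N\big)$ with $N=kM$, and the final bound obtained from Markov's inequality with the choice $M=[\bar M]$. The sketch below works in log space to avoid overflow; the function names are hypothetical. Its checks suggest that the ratio stays below $\sqrt2$ and approaches it, so the constant $A=\sqrt2$ appears to work for every $N$ checked, but the code only tests finitely many values and is not a replacement for the argument above.

```python
from math import e, exp, lgamma, log, sqrt


def log_double_factorial(N):
    """log(1*3*5*...*(2N-1)) = log((2N)! / (2^N N!)), computed via lgamma."""
    return lgamma(2 * N + 1) - N * log(2) - lgamma(N + 1)


def stirling_ratio(N):
    """(2N-1)!! / ((2/e)^N * N^N); by Stirling's formula this tends to sqrt(2)."""
    return exp(log_double_factorial(N) - N * log(2 / e) - N * log(N))


def log_markov_bound(u, V, k, M):
    """log of (2kM-1)!! * V^{2M} / u^{2M}: Markov's inequality combined with (1.5)."""
    return log_double_factorial(k * M) + 2 * M * log(V / u)


def proof_choice(u, V, k):
    """M = [M_bar] with k * M_bar = (u/V)^(2/k) / 2, as in the proof of Theorem 1."""
    return int(0.5 * (u / V) ** (2 / k) / k)


# Usage: compare the moment bound with sqrt(2) e^k exp(-(u/V)^(2/k) / 2), in logs.
k, V, u = 2, 1.0, 20.0
M = proof_choice(u, V, k)
print(M, log_markov_bound(u, V, k, M), log(sqrt(2)) + k - 0.5 * (u / V) ** (2 / k))
```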

3. A discussion about the results.

Let us look at what kind of estimate Borell's inequality yields for the expression $Z$ defined in (1.2). It is natural to apply it with the choice $q=2$. Since $EZ^2=k!\,V^2$, Borell's inequality yields with such a choice the estimate $E|Z|^{2p}\le(2p-1)^{kp}(k!\,V^2)^p$ for all real numbers $p>1$. Let us compare this inequality for the moments $EZ^{2M}$ with large integers $M$ with the estimate of Theorem 2. If we disregard some constant factors not depending on $M$, we get that this estimate is of order $(2M)^{kM}V^{2M}\cdot(k!)^M$, while Theorem 2 yields an estimate of order $(2M)^{kM}V^{2M}\cdot\big(\frac ke\big)^{kM}$. It can be seen that $k!>\big(\frac ke\big)^k$ for all $k\ge1$. This means that Borell's inequality shows that $EZ^{2M}\le C^M(kM)^{kM}V^{2M}$ for large $M$ with a constant $C$ depending only on the parameter $k$ in formula (1.2), but it does not give the optimal choice for the parameter $C$. As a consequence, it implies a weakened version $P(|Z|>u)\le A\exp\big\{-B\big(\frac uV\big)^{2/k}\big\}$ of the inequality of Theorem 1 with some constants $A$ and $B$, but it cannot yield the optimal choice $B=\frac12$. In short, Theorem 2 is weaker than Borell's inequality in the respect that it compares only the second and the $2M$-th moments of the random variable $Z$, but it yields a sharper bound. Hence it can be more useful in certain applications.
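The comparison above can be made concrete by computing, in exact integers, the coefficients of $V^{2M}$ in the two moment bounds: $(2M-1)^{kM}(k!)^M$ from Borell's inequality with $q=2$, $p=M$, and $1\cdot3\cdots(2kM-1)$ from Theorem 2. The sketch below (hypothetical function names) does this. For $M=1$ the inequality (1.9) does not apply, since it requires $p>1$; the formula then just reproduces the identity $EZ^2=k!\,V^2$, which is smaller than the Theorem 2 bound. The tests only cover $k\le3$ and $2\le M\le30$.

```python
from math import factorial, prod


def theorem2_factor(k, M):
    """1 * 3 * 5 * ... * (2kM - 1): the coefficient of V^{2M} in (1.5)."""
    return prod(range(1, 2 * k * M, 2))


def borell_factor(k, M):
    """(2M - 1)^{kM} * (k!)^M: the coefficient of V^{2M} from (1.9) with q = 2, p = M."""
    return (2 * M - 1) ** (k * M) * factorial(k) ** M


# Usage: ratio Borell / Theorem 2 for a few moments.
for k in (1, 2, 3):
    print(k, [borell_factor(k, M) / theorem2_factor(k, M) for M in (2, 5, 10)])
```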

Let us finally remark that we have actually proved a stronger result than Theorems 1 and 2. In those results we have defined the random variable $Z$ with the help of independent random variables $\varepsilon_j$ with distribution $P(\varepsilon_j=1)=P(\varepsilon_j=-1)=\frac12$. But the proof of Theorems 1 and 2 also works without any change for independent random variables with other distributions, provided they are sub-Gaussian in the sense defined below. Let us formulate this result. First I introduce the following notion.

Definition of sub-Gaussian distributions. Let us call a random variable $\xi$, or its distribution, sub-Gaussian if its moments satisfy the relations $E\xi^{2M-1}=0$ and $E\xi^{2M}\le E\eta^{2M}$ for all $M=1,2,\dots$, where $\eta$ is a random variable with standard normal distribution.
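For symmetric distributions the odd moments vanish, so the definition reduces to comparing even moments with $E\eta^{2M}=1\cdot3\cdots(2M-1)$. The sketch below runs this comparison in exact arithmetic for three symmetric distributions with variance 1: the random sign, the uniform distribution on $[-\sqrt3,\sqrt3]$ (with $EU^{2M}=3^M/(2M+1)$), and the Laplace distribution with scale $b$, $b^2=\frac12$ (with $EX^{2M}=(2M)!\,b^{2M}$). These examples are chosen for illustration only, and checking $M\le20$ is a finite check, not a proof.

```python
from fractions import Fraction
from math import factorial, prod


def gaussian_even_moment(M):
    """E eta^{2M} = 1 * 3 * ... * (2M - 1) for a standard normal eta."""
    return prod(range(1, 2 * M, 2))


def rademacher_even_moment(M):
    """E eps^{2M} = 1 for a fair random sign."""
    return 1


def uniform_even_moment(M):
    """Uniform on [-sqrt(3), sqrt(3)] (variance 1): E U^{2M} = 3^M / (2M + 1)."""
    return Fraction(3 ** M, 2 * M + 1)


def laplace_even_moment(M):
    """Laplace with scale b, b^2 = 1/2 (variance 1): E X^{2M} = (2M)! b^{2M} = (2M)! / 2^M."""
    return Fraction(factorial(2 * M), 2 ** M)


def passes_even_moment_check(even_moment, max_M=20):
    """Finite check of E xi^{2M} <= E eta^{2M} for M = 1..max_M."""
    return all(even_moment(M) <= gaussian_even_moment(M) for M in range(1, max_M + 1))


# Usage: the Laplace law already fails at M = 2, since E X^4 = 6 > 3 = E eta^4.
for name, moment in [("sign", rademacher_even_moment),
                     ("uniform", uniform_even_moment),
                     ("laplace", laplace_even_moment)]:
    print(name, passes_even_moment_check(moment))
```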

It is clear that a random variable $\varepsilon$ with distribution $P(\varepsilon=1)=P(\varepsilon=-1)=\frac12$ is sub-Gaussian. Because of some symmetrization arguments applied in probability theory this seems to be the most important example of sub-Gaussian random variables, but the following result holds for all of them.
Theorem 3. Let $\varepsilon_1,\dots,\varepsilon_n$ be independent sub-Gaussian random variables (with possibly different distributions). Let us define the random variable $Z$ by formula (1.2), replacing the original random variables $\varepsilon_1,\dots,\varepsilon_n$ with these new random variables $\varepsilon_1,\dots,\varepsilon_n$. This new random variable $Z$ also satisfies the estimate (1.4) of Theorem 1 and the estimate (1.5) of Theorem 2.

Theorem 3 means that the distributions and moments of homogeneous polynomials of independent sub-Gaussian random variables satisfy the same estimates as the distributions and moments of homogeneous polynomials of Gaussian random variables. Here the sub-Gaussian property plays an essential role. In the case of homogeneous polynomials of independent, but not necessarily sub-Gaussian random variables the situation is much more complex. But this problem will not be discussed here.

Appendix 

To prove the inequality formulated at the end of the proof of Lemma 2 we need a result which expresses the expected value of the product of multiple Wiener-Itô integrals in an appropriate way. To formulate this result, which is a simple consequence of a basic result of the theory of Wiener-Itô integrals, the so-called diagram formula, I first have to introduce some notation. Let me recall that, given a $\sigma$-finite measure $\mu$ on some measurable space $(X,\mathcal X)$, we call a white noise with counting measure $\mu$ a Gaussian field $\mu_W(A)$, $A\in\mathcal X$, indexed by the measurable sets of $X$, which satisfies the relations $E\mu_W(A)=0$ and $E\mu_W(A)\mu_W(B)=\mu(A\cap B)$ for all $A,B\in\mathcal X$.

Let us have a $\sigma$-finite measure $\mu$ together with a white noise $\mu_W$ with counting measure $\mu$ on $(X,\mathcal X)$. Let us consider $L$ real valued functions $f_l(x_1,\dots,x_{k_l})$ on $(X^{k_l},\mathcal X^{k_l})$ such that $\int f_l^2(x_1,\dots,x_{k_l})\,\mu(dx_1)\dots\mu(dx_{k_l})<\infty$, $1\le l\le L$. Let us consider the Wiener-Itô integrals $k_l!\,J_{\mu,k_l}(f_l)=\int f_l(x_1,\dots,x_{k_l})\,\mu_W(dx_1)\dots\mu_W(dx_{k_l})$, $1\le l\le L$, and let us describe how the expected value $E\big(\prod_{l=1}^L k_l!\,J_{\mu,k_l}(f_l)\big)$ can be calculated by means of the diagram formula.

For this goal let us introduce the following notation. Put

$$F\big(x_{(l,j)},\ 1\le l\le L,\ 1\le j\le k_l\big)=\prod_{l=1}^L f_l\big(x_{(l,1)},\dots,x_{(l,k_l)}\big), \tag{A1}$$

and define a class of diagrams $\Gamma(k_1,\dots,k_L)$ in the following way. Each diagram $\gamma\in\Gamma(k_1,\dots,k_L)$ is a (complete, undirected) graph with vertices $(l,j)$, $1\le l\le L$, $1\le j\le k_l$, and we shall call the set of vertices with a fixed index $l$ the $l$-th row of the graphs $\gamma\in\Gamma(k_1,\dots,k_L)$. The graphs $\gamma\in\Gamma(k_1,\dots,k_L)$ have edges with the following properties. Each edge connects vertices $(l,j)$ and $(l',j')$ from different rows, i.e. $l\ne l'$ for the end-points of an edge. From each vertex there starts exactly one edge. $\Gamma(k_1,\dots,k_L)$ contains all graphs $\gamma$ with these properties. If there is no such graph, then $\Gamma(k_1,\dots,k_L)$ is empty.
Put $2N=\sum_{l=1}^L k_l$. Then each $\gamma\in\Gamma(k_1,\dots,k_L)$ contains exactly $N$ edges. If an edge of the diagram $\gamma$ connects some vertex $(l,j)$ with some other vertex $(l',j')$, $l'>l$, then we call $(l',j')$ the lower end-point of this edge, and we denote the set of lower end-points of $\gamma$ by $A_\gamma$, which has $N$ elements. Let us also introduce the following function $\alpha_\gamma$ on the vertices of $\gamma$. Put $\alpha_\gamma(l,j)=(l,j)$ if $(l,j)$ is the lower end-point of an edge, and $\alpha_\gamma(l,j)=(l',j')$ if $(l,j)$ is connected with the point $(l',j')$ by an edge of $\gamma$ and $(l',j')$ is the lower end-point of this edge. Then we define the function

$$F_\gamma\big(x_{(l,j)},\ (l,j)\in A_\gamma\big)=F\big(x_{\alpha_\gamma(l,j)},\ 1\le l\le L,\ 1\le j\le k_l\big)$$

with the function $F$ introduced in (A1), i.e. we replace the argument $x_{(l,j)}$ by $x_{(l',j')}$ in the function $F$ if $(l,j)$ and $(l',j')$ are connected by an edge in $\gamma$ and $l'>l$. Then we enumerate the lower end-points somehow, and define the function $B_\gamma(r)$, $1\le r\le N$, such that $B_\gamma(r)$ is the $r$-th lower end-point of the diagram $\gamma$. Write

$$\bar F_\gamma(x_1,\dots,x_N)=F_\gamma\big(x_{B_\gamma(r)},\ 1\le r\le N\big),$$

where the variable $x_{B_\gamma(r)}$ is renamed $x_r$, $1\le r\le N$, and

$$\bar F_\gamma=\int\cdots\int\bar F_\gamma(x_1,\dots,x_N)\,\mu(dx_1)\dots\mu(dx_N)\quad\text{for all }\gamma\in\Gamma(k_1,\dots,k_L).$$
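The combinatorial objects defined above are easy to enumerate for small row lengths. The sketch below lists the diagrams of $\Gamma(k_1,\dots,k_L)$ as perfect matchings of the vertices $(l,j)$ with no edge inside a row, and computes the set $A_\gamma$ of lower end-points and the map $\alpha_\gamma$. The representation (edges as pairs of 1-based vertex tuples) and the function names are assumptions made for this illustration.

```python
def diagrams(row_lengths):
    """Enumerate Gamma(k_1, ..., k_L): perfect matchings of the vertices (l, j),
    1 <= j <= k_l, in which no edge joins two vertices of the same row.
    Each diagram is a list of edges ((l, j), (l2, j2)) with l < l2."""
    vertices = [(l, j) for l, k in enumerate(row_lengths, start=1)
                for j in range(1, k + 1)]

    def extend(remaining):
        if not remaining:
            yield []
            return
        first, rest = remaining[0], remaining[1:]
        for idx, partner in enumerate(rest):
            if partner[0] != first[0]:  # edges must join different rows
                others = rest[:idx] + rest[idx + 1:]
                for tail in extend(others):
                    yield [(first, partner)] + tail

    yield from extend(vertices)


def lower_endpoints(diagram):
    """A_gamma: the end-point with the larger row index of each edge, sorted."""
    return sorted(max(edge) for edge in diagram)


def alpha(diagram):
    """alpha_gamma: maps every vertex to the lower end-point of its edge."""
    mapping = {}
    for u, v in diagram:
        low = max(u, v)  # rows differ, so the tuple order is decided by the row index
        mapping[u] = low
        mapping[v] = low
    return mapping


# Usage: the two diagrams of Gamma(2, 1, 1).
for d in diagrams([2, 1, 1]):
    print(d, lower_endpoints(d), alpha(d))
```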

Now we formulate the corollary of the diagram formula we need.

Theorem C. With the above introduced notation

$$E\Big(\prod_{l=1}^L k_l!\,J_{\mu,k_l}(f_l)\Big)=\sum_{\gamma\in\Gamma(k_1,\dots,k_L)}\bar F_\gamma.$$

(If $\Gamma(k_1,\dots,k_L)$ is empty, then the expected value of the above product of random integrals equals zero.) Besides,

$$|\bar F_\gamma|\le\prod_{l=1}^L\Big(\int f_l^2(x_1,\dots,x_{k_l})\,\mu(dx_1)\dots\mu(dx_{k_l})\Big)^{1/2}\quad\text{for all }\gamma\in\Gamma(k_1,\dots,k_L).$$

Now we turn to the proof of the inequality

$$E\big(k!\,J_{\mu,k}(f)\big)^{2M}\le1\cdot3\cdot5\cdots(2kM-1)\Big(\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)\Big)^M. \tag{A2}$$

Proof of Relation (A2). Relation (A2) can be simply proved with the help of Theorem C if we apply it with $L=2M$ and the functions $f_l(x_1,\dots,x_{k_l})=f(x_1,\dots,x_k)$ for all $1\le l\le 2M$. Then Theorem C yields that

$$E\big(k!\,J_{\mu,k}(f)\big)^{2M}\le\Big(\int f^2(x_1,\dots,x_k)\,\mu(dx_1)\dots\mu(dx_k)\Big)^M\,|\Gamma_{2M}(k)|,$$

where $|\Gamma_{2M}(k)|$ denotes the number of diagrams $\gamma$ in $\Gamma(\underbrace{k,\dots,k}_{2M\text{ times}})$. Thus to complete the proof of relation (A2) it is enough to show that $|\Gamma_{2M}(k)|\le1\cdot3\cdot5\cdots(2kM-1)$. But this can be seen simply with the help of the following observation. Let $\bar\Gamma_{2M}(k)$ denote the class of all graphs with vertices $(l,j)$, $1\le l\le 2M$, $1\le j\le k$, such that from all vertices $(l,j)$ exactly one edge starts and all edges connect different vertices, but we also allow edges connecting vertices $(l,j)$ and $(l,j')$ with the same first coordinate $l$. Let $|\bar\Gamma_{2M}(k)|$ denote the number of graphs in $\bar\Gamma_{2M}(k)$. Then clearly $|\Gamma_{2M}(k)|\le|\bar\Gamma_{2M}(k)|$. On the other hand, $|\bar\Gamma_{2M}(k)|=1\cdot3\cdot5\cdots(2kM-1)$. Indeed, let us list the vertices of the graphs from $\bar\Gamma_{2M}(k)$ in an arbitrary way. Then the first vertex can be paired with another vertex in $2kM-1$ ways; after this, the first vertex from which no edge starts can be paired with $2kM-3$ vertices from which no edge starts. By following this procedure the next edge can be chosen in $2kM-5$ ways, and by continuing this calculation we get the desired relation.
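The counting argument above can be checked by a small recursion. Fix one unmatched vertex, sum over its possible partners, and recurse; since rows of equal size are interchangeable, the state can be taken to be the sorted tuple of numbers of unmatched vertices per row. With same-row edges allowed this counts $\bar\Gamma_{2M}(k)$, otherwise $\Gamma_{2M}(k)$. The function name and the state encoding are illustrative assumptions.

```python
from functools import lru_cache
from math import prod


def count_pairings(k, M, forbid_same_row):
    """Number of perfect matchings of 2M rows of k vertices each.

    forbid_same_row=True counts |Gamma_2M(k)|, False counts |bar Gamma_2M(k)|.
    The state is the sorted tuple of unmatched-vertex counts per row."""
    @lru_cache(maxsize=None)
    def count(state):
        rows = [r for r in state if r > 0]
        if not rows:
            return 1
        rows[0] -= 1  # match one fixed vertex of the first nonempty row
        total = 0
        for i, r in enumerate(rows):
            if r == 0 or (i == 0 and forbid_same_row):
                continue
            # r choices of partner in row i; then row i loses one unmatched vertex
            total += r * count(tuple(sorted(rows[:i] + [r - 1] + rows[i + 1:])))
        return total

    return count(tuple([k] * (2 * M)))


# Usage: |Gamma_2M(k)| versus 1*3*...*(2kM-1).
for k in (1, 2, 3):
    for M in (1, 2, 3):
        print(k, M, count_pairings(k, M, True), prod(range(1, 2 * k * M, 2)))
```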

References 

1. Borell, C. (1979) On the integrability of Banach space valued Walsh polynomials. Séminaire de Probabilités XIII, Lecture Notes in Math. 721, 1-3. Springer, Berlin.

2. Dudley, R. M. (1998) Uniform Central Limit Theorems. Cambridge University Press, Cambridge, U.K.

3. Dynkin, E. B. and Mandelbaum, A. (1983) Symmetric statistics, Poisson processes and multiple Wiener integrals. Annals of Statistics 11, 739-745.

4. Major, P. (1981) Multiple Wiener-Itô integrals. Lecture Notes in Mathematics 849, Springer-Verlag, Berlin, Heidelberg, New York.

5. Major, P. (2004) On a multivariate version of Bernstein's inequality. Submitted to Ann. Probab.



Abbreviated title: Hoeffding's inequality 






